Controllable data synthesis method for grammatical error correction
Abstract
Due to the lack of parallel data for the grammatical error correction (GEC) task, models based on the sequence-to-sequence framework cannot be adequately trained to obtain higher performance. We propose two data synthesis methods that can control the error rate and the ratio of error types in synthetic data. The first approach corrupts each word in a monolingual corpus with a fixed probability, using replacement, insertion, and deletion. The second approach trains error generation models and then filters the decoding results of those models. Experiments on different synthetic data show that a 40% error rate combined with the same ratio of error types improves model performance more. Finally, we synthesize about 100 million synthetic samples and achieve performance comparable to the state of the art, which uses twice as much data as we use.
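The first approach in the abstract (corrupting each word with a fixed probability through replacement, insertion, or deletion) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `corrupt_sentence`, the parameters `error_rate` and `type_weights`, and the choice to draw replacement and inserted words uniformly from a vocabulary list are all assumptions made for illustration.

```python
import random

def corrupt_sentence(tokens, vocab, error_rate=0.4, type_weights=(1, 1, 1), rng=None):
    """Return a noisy copy of `tokens` (hypothetical helper, not from the paper).

    Each token is corrupted independently with probability `error_rate`.
    The corruption type is drawn from (replace, insert, delete) according
    to `type_weights`, which controls the ratio of error types.
    """
    rng = rng or random.Random()
    ops = ("replace", "insert", "delete")
    noisy = []
    for tok in tokens:
        if rng.random() >= error_rate:
            noisy.append(tok)  # token left unchanged
            continue
        op = rng.choices(ops, weights=type_weights, k=1)[0]
        if op == "replace":
            # Assumption: uniform draw from vocab; it may pick the same word.
            noisy.append(rng.choice(vocab))
        elif op == "insert":
            # Keep the token and add a random word after it.
            noisy.append(tok)
            noisy.append(rng.choice(vocab))
        # "delete": the token is simply dropped.
    return noisy

if __name__ == "__main__":
    source = "she goes to school every day".split()
    vocab = ["a", "the", "go", "went", "in", "on", "at"]
    noisy = corrupt_sentence(source, vocab, error_rate=0.4, rng=random.Random(0))
    # (noisy, source) would form one synthetic training pair.
    print(" ".join(noisy), "->", " ".join(source))
```

Pairing each corrupted sentence with its clean original yields synthetic parallel data. Here `error_rate` sets the share of corrupted words and `type_weights` sets the mix of error types, the two quantities the abstract says the methods can control.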
Similar resources
Grammatical Error Correction
Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in written text. Earlier attempts at grammatical error correction involved rule-based and classifier approaches, which are limited to correcting only particular types of errors in a sentence. As sentences may contain multiple errors of different types, a practical error correction system should be ab...
Grammatical Error Correction of English as Foreign Language Learners
This study aimed to discover the insight of error correction by implementing two correction systems on three Iranian university students. The three students were invited to write four in-class essays throughout the semester, in which their verb errors and individual-selected errors were corrected using the Code Correction System and the Individual Correction System. At the end of the study, the...
System Combination for Grammatical Error Correction
Different approaches to high-quality grammatical error correction have been proposed recently, many of which have their own strengths and weaknesses. Most of these approaches are based on classification or statistical machine translation (SMT). In this paper, we propose to combine the output from a classification-based system and an SMT-based system to improve the correction quality. We adopt t...
Generating artificial errors for grammatical error correction
This paper explores the generation of artificial errors for correcting grammatical mistakes made by learners of English as a second language. Artificial errors are injected into a set of error-free sentences in a probabilistic manner using statistics from a corpus. Unlike previous approaches, we use linguistic information to derive error generation probabilities and build corpora to correct sev...
Memory-based Grammatical Error Correction
We describe the 'TILB' team entry for the CoNLL-2013 Shared Task. Our system consists of five memory-based classifiers that generate correction suggestions for center positions in small text windows of two words to the left and to the right. Trained on the Google Web 1T corpus, the first two classifiers determine the presence of a determiner or a preposition between all words in a text. The sec...
Journal
Journal title: Frontiers of Computer Science
Year: 2021
ISSN: 1673-7350, 1673-7466
DOI: https://doi.org/10.1007/s11704-020-0286-4